4 research outputs found

    AsterixDB: A Scalable, Open Source BDMS

    Full text link
    AsterixDB is a new, full-function BDMS (Big Data Management System) with a feature set that distinguishes it from other platforms in today's open source Big Data ecosystem. Its features make it well-suited to applications like web data warehousing, social data storage and analysis, and other use cases related to Big Data. AsterixDB has a flexible NoSQL style data model; a query language that supports a wide range of queries; a scalable runtime; partitioned, LSM-based data storage and indexing (including B+-tree, R-tree, and text indexes); support for external as well as natively stored data; a rich set of built-in types; support for fuzzy, spatial, and temporal types and queries; a built-in notion of data feeds for ingestion of data; and transaction support akin to that of a NoSQL store. Development of AsterixDB began in 2009 and led to a mid-2013 initial open source release. This paper is the first complete description of the resulting open source AsterixDB system. Covered herein are the system's data model, its query language, and its software architecture. Also included are a summary of the current status of the project and a first glimpse into how AsterixDB performs when compared to alternative technologies, including a parallel relational DBMS, a popular NoSQL store, and a popular Hadoop-based SQL data analytics platform, for things that both technologies can do. Also included is a brief description of some initial trials that the system has undergone and the lessons learned (and plans laid) based on those early "customer" engagements

    Spatial indexing in the Era of Social Media

    No full text
    The rapid adoption of smart phones and the social media boom has increased the interest in location-based services. A new set of applications and popular online services that utilize users' locations have been created, and many ordinary people are increasingly interacting with these services on a daily basis through their smart phones, tablets, cameras, etc., where most of those gadgets come equipped with GPS sensors. The new complex features provided by those applications and the scale of the massive data handled by them impose new and interesting challenges for spatial databases. In this thesis, we present spatial indexing and query processing techniques in response to some of these challenges. First, we study how to support approximate keyword search on spatial data. There are many popular websites that support keyword search on their spatial data, such as business listings and photos. In these systems, users may experience difficulties finding the entities they are looking for if they do not know their exact spelling, such as the name of a restaurant. We develop three algorithms for constructing a specialized index that can answer location- based approximate keyword queries, successively improving the time and space efficiency by exploiting the textual and spatial properties of the data. We experimentally demonstrate the efficiency of our techniques on real, large datasets. Second, we introduce a framework for converting an in-place update, disk-based data structure to a deferred-update, append-only data structure. We show that converting an R-tree index (and other non-totally ordered index) to an LSM index is non-trivial if the resultant index is expected to have performant read and write operations. Our framework enables the "LSM-ification" of any kind of index structure that supports certain primitive operations, enabling the index to ingest data efficiently. We have implemented our framework in the context of the AsterixDB system as a way to extend both the R-tree and the inverted keyword index to LSM-based indexes. Our results have shown that using an LSM-based version of the R-tree can significantly outperform its conventional counterpart for both ingestion and query speed. Third, we study how to optimize the performance of query workloads that favor recent data. There are many use cases where users of a database system are mostly interested in querying recent data. We propose a solution that exploits the natural partitioning property that LSM-based indexes provide for its components, allowing us to filter out many components when answering queries. Our solution is generalizable to any LSM-based index structure including LSM R-trees, and has been implemented in the context of the AsterixDB system. Our experiments show that we can reduce query times by up to 99% for selective range predicates
    corecore